The notebook 101_annotate_raw.ipynb was used to annotate bad time segments and bad channels. It reads the filtered raw object of a subject so that annotations can be made in the interactive MNE cleaning window opened via raw.plot. The annotated bad time segments and bad channels are then written to the subject's designated files manual-badSegments.tsv and manual-badChannels.tsv. If the user chooses to preprocess the subject manually, the annotations in these files are added to the raw object in the pipeline step prepare_raw. The annotated bad time segments are taken into account, for example, when epochs are generated: with the parameter setting reject_by_annotation=True, the epoching function excludes every epoch that overlaps a time span carrying a BAD_ annotation.
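
The effect of reject_by_annotation=True can be illustrated with a minimal, library-free sketch. It is not MNE's implementation; the Annotation class, the helper names and the example times are hypothetical and only mirror the idea that an epoch is dropped as soon as it overlaps a span whose description starts with BAD.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    onset: float       # seconds from the start of the recording
    duration: float    # seconds
    description: str   # e.g. "BAD_blink"


def overlaps_bad(epoch_start, epoch_end, annotations):
    """Return True if the epoch overlaps any annotation marked as BAD."""
    for ann in annotations:
        if not ann.description.upper().startswith("BAD"):
            continue  # other annotations do not cause rejection
        ann_end = ann.onset + ann.duration
        if epoch_start < ann_end and ann.onset < epoch_end:
            return True
    return False


def keep_clean_epochs(event_times, tmin, tmax, annotations):
    """Keep only the events whose epoch window is free of BAD spans."""
    return [t for t in event_times
            if not overlaps_bad(t + tmin, t + tmax, annotations)]


# Hypothetical annotations: bad recording start, one blink, one neutral label.
annotations = [
    Annotation(0.0, 5.0, "BAD_start"),
    Annotation(12.0, 0.5, "BAD_blink"),
    Annotation(20.0, 1.0, "stimulus_block"),
]
events = [4.9, 8.0, 11.9, 15.0, 20.2]  # event onsets in seconds
clean = keep_clean_epochs(events, tmin=-0.2, tmax=0.8, annotations=annotations)
print(clean)
```

Only the epochs around 4.9 s and 11.9 s touch a BAD span and are removed; the annotation without the BAD prefix has no effect.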

The BAD_ annotations were created by clicking and dragging the mouse in the annotation mode of the interactive MNE plot window. The data of the raw object were inspected over the entire recording time and across all channels. For example, the complete beginning of the recording was always annotated as BAD_, because the electrodes need a certain time to reach equilibrium. Artefacts in EEG arise from blinks, eye movements and muscle contractions. They have to be removed to increase the certainty that the results obtained later are indeed due to brain activity. The EOG signals indicated where the subject blinked or moved their eyes, and the corresponding time spans were annotated. The point in the experiment to which the inspected data belonged was also taken into account: cleaning was less restrictive at the times when the stimuli were presented, in order not to exclude relevant brain activity from further processing.

In addition to cleaning in time by annotating bad time segments, cleaning in space is also possible via the interactive MNE plot window. Because a source in the brain spreads to many electrodes, the channels are highly redundant and strongly correlated. In some recordings an electrode is not placed correctly on the subject's head, moves during the recording or is simply broken. Visual inspection of the raw data of all three subjects did not reveal noisy data in any of the channels. Therefore, it was considered sufficient to limit cleaning via 101_annotate_raw.ipynb to the bad time segments. As a result, the files manual-badChannels.tsv are empty.

The script 102_label_badComponents.py is likewise not part of the pipeline. It applies the ICA decomposition generated by the pipeline to the prepared raw object of the three subjects 002, 003 and 004. For each subject, several plots of the ICA solution are produced so that the artefactual ICs can be identified manually. The numbers of these ICs are finally recorded in a tsv file. The pipeline reads this tsv file in the apply_ica step when the user requests manual preprocessing for a subject. In the following, the ICA solution of Subject 002 is used to explain which criteria were used to classify the ICs. The two sources [Source 2] and [Source 3] assisted in interpreting the plots. The ICA artefact rejection for Subjects 003 and 004 was performed analogously to that for Subject 002.

The most important information for evaluating the individual ICs was the plot of their properties. Using this plot, for example, IC 4 of Subject 002 can be identified as a brain component. Its scalp topography indicates that the component can be traced back to a single dipole in the brain. The power spectrum supports this assumption, as it shows a peak at 10 Hz. Brain components tend to show repeating patterns at certain frequencies, which lead to a peak in the power spectrum, typically at around 10 Hz [Source 2]. The segment image of IC 4 is based on segmented data of the entire raw object. As expected, no time-locked activity can be detected in these segments of continuous data. Instead, the activity is relatively uniform across many segments and no individual segments stand out. The plot of the component's variance over time also supports the assumption that it captures brain activity: compared to other components, IC 4 shows a similar and low variance across the segments.
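
The spectral criterion can be made concrete with a small sketch on synthetic signals. It uses a naive discrete Fourier transform from the standard library rather than the PSD estimate that the MNE property plot computes, and the sampling rate, signal length and amplitudes are assumptions chosen purely for illustration.

```python
import cmath
import math


def power_spectrum(signal, sfreq):
    """Naive DFT power spectrum for the non-negative frequencies."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        freqs.append(k * sfreq / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def peak_frequency(signal, sfreq):
    """Frequency (in Hz) at which the power spectrum is largest."""
    freqs, power = power_spectrum(signal, sfreq)
    best = max(range(len(power)), key=power.__getitem__)
    return freqs[best]


sfreq = 100.0                      # assumed sampling rate in Hz
times = [i / sfreq for i in range(200)]

# Synthetic "brain-like" component: dominant 10 Hz rhythm plus a weaker 3 Hz part.
brain_like = [math.sin(2 * math.pi * 10 * t) + 0.3 * math.sin(2 * math.pi * 3 * t)
              for t in times]
# Synthetic "eye-like" component: slow 1 Hz activity only.
eye_like = [math.sin(2 * math.pi * 1 * t) for t in times]

brain_peak = peak_frequency(brain_like, sfreq)
eye_peak = peak_frequency(eye_like, sfreq)
print(brain_peak, eye_peak)
```

The first signal peaks at 10 Hz, the second at the low-frequency end of the spectrum, which is the contrast the classification relies on.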

ICs 0 and 11, on the other hand, were identified as eye components. The source of IC 0 is assumed to be blinks, whereas IC 11 reflects horizontal eye movements of the subject. For both components, the scalp topography shows a strong effect on the electrodes placed close to the eyes, which indicates that the source originates near the eyes. In the scalp topography of IC 11, maxima with opposite polarity occur at the front left and front right. This is typical for horizontal eye movements. In the PSD, both ICs lack the peak at 10 Hz that is characteristic of a brain component. Instead, blink artefacts appear as a peak at the low-frequency end of the spectrum. IC 0 in particular shows clear evidence of blinks. They show up in the segment image as relatively short stripes that occur from time to time, but neither evenly distributed nor in every segment. Another hint that the components should be classified as artefacts is the plot of their variance. Most segments have a low variance, but individual segments repeatedly show a very high variance over time. This indicates noise in the data.
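
The variance criterion was applied by visual inspection of the property plots. As a rough illustration of what "individual segments with very high variance" means, the following sketch flags such segments in a synthetic component time course. The threshold factor, the segment length and the blink shape are hypothetical choices and not part of the original script.

```python
import math
import statistics


def segment_variances(signal, segment_len):
    """Variance of each consecutive, non-overlapping segment."""
    return [statistics.pvariance(signal[i:i + segment_len])
            for i in range(0, len(signal) - segment_len + 1, segment_len)]


def flag_high_variance(variances, factor=5.0):
    """Indices of segments whose variance exceeds factor * median variance."""
    threshold = factor * statistics.median(variances)
    return [i for i, v in enumerate(variances) if v > threshold]


sfreq = 100                         # assumed sampling rate in Hz
# Low-amplitude 10 Hz background over 10 s.
signal = [0.1 * math.sin(2 * math.pi * 10 * i / sfreq) for i in range(1000)]
# Two synthetic blink-like bumps (0.3 s each) in segments 2 and 7.
for start in (230, 740):
    for j in range(30):
        signal[start + j] += 5.0 * math.sin(math.pi * j / 30)

variances = segment_variances(signal, segment_len=100)  # 1 s segments
flagged = flag_high_variance(variances)
print(flagged)
```

Most segments share the same low variance, while the two segments containing the bumps stand out, which is the pattern described for ICs 0 and 11.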

After inspecting the IC property plots, ICs 0 and 11 were listed as bad components in the code of 102_label_badComponents.py. Muscle components could not be detected in the ICA decomposition of Subject 002. Since their source is not within the brain, they would appear very focal in the scalp topography, i.e. concentrated in relatively small regions. In addition, increased high-frequency activity is characteristic of muscle components. Neither the scalp topography plots nor the PSDs of the ICs of Subject 002 clearly indicate a muscle component. Therefore, no further IC was marked for exclusion. Overall, the guiding principle was to mark as few ICs as possible, and only clearly artefactual ones, for exclusion. Otherwise, actual brain activity could be lost through cleaning.

The overlay plot was used to assess how well the cleaning by excluding the artefactual ICs worked. It shows the raw data before and after the ICA artefact rejection. In addition, a cross-channel average is shown. Dipolar sources are cancelled out by the averaging, whereas peaks indicate artefacts. The overlay plot of Subject 002 supports the assumption that artefacts were removed from the signal by excluding ICs 0 and 11: clear peaks are visible in the signal before cleaning, but they are no longer present after cleaning.
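
The logic behind the overlay plot can be sketched with a toy linear mixing model. This is not how MNE applies an ICA solution internally (which also involves the PCA step); the mixing weights, the two synthetic sources and all helper names are assumptions made for illustration.

```python
import math


def reconstruct(mixing, sources, exclude=()):
    """Rebuild channel data from sources, skipping the excluded components."""
    n_channels = len(mixing)
    n_times = len(sources[0])
    data = [[0.0] * n_times for _ in range(n_channels)]
    for k, source in enumerate(sources):
        if k in exclude:
            continue  # excluded component contributes nothing
        for c in range(n_channels):
            weight = mixing[c][k]
            for t in range(n_times):
                data[c][t] += weight * source[t]
    return data


def channel_average(data):
    """Average across channels at every time point."""
    n_channels = len(data)
    return [sum(column) / n_channels for column in zip(*data)]


n_times, sfreq = 200, 100.0
# Component 0: blink-like bump; component 1: 10 Hz oscillation.
blink = [5.0 * math.exp(-((i - 100) / 10) ** 2) for i in range(n_times)]
alpha = [math.sin(2 * math.pi * 10 * i / sfreq) for i in range(n_times)]
sources = [blink, alpha]

# Rows are channels, columns are components. The blink projects with the same
# sign onto all channels; the dipolar source has opposite polarities that sum to zero.
mixing = [
    [1.0, 0.5],
    [0.8, 0.5],
    [0.3, -0.5],
    [0.1, -0.5],
]

original = reconstruct(mixing, sources)
cleaned = reconstruct(mixing, sources, exclude={0})
peak_before = max(abs(v) for v in channel_average(original))
peak_after = max(abs(v) for v in channel_average(cleaned))
print(peak_before, peak_after)
```

In this toy setting the cross-channel average shows a clear peak before cleaning and is flat once component 0 is excluded, because the remaining dipolar source cancels in the average.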
Method: infomax
Fit: 500 iterations on raw data (360448 samples)
ICA components: 28
Explained variance: 100.0 %
Available PCA components: 30
Channel types: eeg
ICA components marked for exclusion: ICA000, ICA011
[Figure: Original and cleaned signal]

Method: infomax
Fit: 500 iterations on raw data (45056 samples)
ICA components: 28
Explained variance: 100.0 %
Available PCA components: 30
Channel types: eeg
ICA components marked for exclusion: ICA000, ICA002, ICA021
[Figure: Original and cleaned signal]

Method: infomax
Fit: 500 iterations on raw data (468992 samples)
ICA components: 28
Explained variance: 100.0 %
Available PCA components: 30
Channel types: eeg
ICA components marked for exclusion: ICA000, ICA008, ICA010, ICA018
[Figure: Original and cleaned signal]

[Figures: ICA sources, one per ICA solution]